Word Class Discovery For Postprocessing Chinese Handwriting Recognition
Abstract
This article presents a novel Chinese class n-gram model for contextual postprocessing of handwriting recognition results. The word classes in the model are automatically discovered by a corpus-based simulated annealing procedure. Three other language models, least-word, word-frequency, and the powerful interword character bigram model, have been constructed for comparison. Extensive experiments on large text corpora show that the discovered class bigram model outperforms the other three competing models.
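As a rough illustration of how a class bigram model can rank recognizer output, the sketch below uses the standard class-based factorization P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i). It is not the article's implementation: the toy English corpus, the hand-assigned classes (the article discovers its classes by simulated annealing, which is not shown here), the add-alpha smoothing, and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Collect the counts a class bigram model needs (hypothetical helper)."""
    context_count = Counter()  # how often a class (or "<s>") precedes another class
    trans_count = Counter()    # (previous class, current class) pairs
    class_count = Counter()    # occurrences of each class
    word_count = Counter()     # occurrences of each word
    for words in sentences:
        prev = "<s>"
        for w in words:
            c = word2class[w]
            context_count[prev] += 1
            trans_count[(prev, c)] += 1
            class_count[c] += 1
            word_count[w] += 1
            prev = c
    return context_count, trans_count, class_count, word_count

def log_prob(words, model, word2class, num_classes, alpha=1.0):
    """Log-probability of a word sequence; assumes every word was seen in training."""
    context_count, trans_count, class_count, word_count = model
    prev, total = "<s>", 0.0
    for w in words:
        c = word2class[w]
        # Add-alpha smoothed class transition probability P(c | prev).
        p_trans = (trans_count[(prev, c)] + alpha) / (
            context_count[prev] + alpha * num_classes)
        # Relative-frequency estimate of P(w | c).
        p_emit = word_count[w] / class_count[c]
        total += math.log(p_trans) + math.log(p_emit)
        prev = c
    return total

# Toy data: classes are assigned by hand purely for illustration.
word2class = {"I": "PRON", "you": "PRON", "read": "VERB", "write": "VERB",
              "books": "NOUN", "papers": "NOUN"}
corpus = [["I", "read", "books"], ["you", "read", "papers"], ["I", "write", "papers"]]
model = train_class_bigram(corpus, word2class)

# Two competing candidate sequences, as a recognizer might propose them.
candidates = [["I", "read", "papers"], ["books", "read", "I"]]
best = max(candidates, key=lambda s: log_prob(s, model, word2class, num_classes=3))
```

Here `best` is the candidate whose class sequence (PRON, VERB, NOUN) matches the transitions seen in the toy corpus. Because scoring is done at the class level, the model can prefer "I read papers" even though that exact word sequence never occurs in training.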
Similar resources
Incorporating diverse information sources in handwriting recognition postprocessing
This paper describes the proposed implementation of a new model for the linguistic postprocessing component of the Human Language Technology (HLT) project. The model was designed for handwriting recognition applications but can be used for other text recognition problems and speech recognition. We demonstrate here that the current implementation (the POS model) fails to incorporate new sources ...
Recognition of Cursive Roman Handwriting - Past, Present and Future
This paper reviews the state of the art in off-line Roman cursive handwriting recognition. The input provided to an off-line handwriting recognition system is an image of a digit, a word, or more generally some text, and the system produces, as output, an ASCII transcription of the input. This task involves a number of processing steps, some of which are quite difficult. Typically, preprocessin...
A New Method for Rotation Free Online Unconstrained Handwritten Chinese Word Recognition: A Holistic Approach
Most online handwriting word recognition (HWR) approaches proceed by segmenting words into isolated characters, which are recognized separately. Inspired by results in cognitive psychology, holistic word recognition approaches provide another effective way to deal with the problem of HWR. In this paper, we propose a new method for rotation free online unconstrained Chinese word recognition through a ...
The Postprocessing of Optical Character Recognition Based on Statistical Noisy Channel and Language Model
Image processing techniques have long been used in optical character recognition (OCR). The recognition method has recently evolved from early "pattern recognition" to "feature extraction". The recognition rate has risen from 70% to 90%. However, the character-by-character recognition technique has its limitations. Using language models to assist the OCR system in improving recognition ...
Evaluation of weighted Fisher criteria for large category dimensionality reduction in application to Chinese handwriting recognition
To improve the class separability of Fisher linear discriminant analysis (FDA) for large category problems, we investigate the weighted Fisher criterion (WFC) by integrating weighting functions for dimensionality reduction. The objective of WFC is to maximize the sum of weighted distances of all class pairs. By setting larger weights for the most confusable classes, WFC can improve the class se...